improve bin-packing performance #652

Merged: 4 commits into aws:main on Sep 1, 2021
Conversation

@bwagner5 (Contributor)

Issue, if available:
#625

Description of changes:

  • Increased bin-packing performance. This optimization adds a check for a terminable case in a loop, which allows a significant number of loop iterations to be skipped.

    • The loop is responsible for packing pods onto a specific packable node. Once the node was full, the loop would previously continue iterating unnecessarily. This change also adds a smaller optimization to the check that determines whether the node is full: the smallest pod in the list (the list is already sorted when passed in) is added to the currently reserved resources to see whether any more progress could possibly be made by continuing through the loop (see the sketch after this list).
  • The terminable-case optimization in the packing loop yielded roughly a 30x improvement.

  • The smaller "min-pod" optimization resulted in a further 50% improvement.
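
A rough sketch of that check, assuming the reserve helper quoted later in this conversation (the actual merged isFull may differ):

// Sketch only: would reserving even the smallest remaining pod overflow the
// node's totals? If so, no further progress is possible and the packing loop
// can terminate early.
func (p *Packable) isFull(minPod *v1.Pod) bool {
	requests := resources.RequestsForPods(minPod)
	requests[v1.ResourcePods] = *resource.NewQuantity(1, resource.BinarySI)
	candidate := resources.Merge(p.reserved, requests)
	for resourceName, quantity := range candidate {
		if quantity.Cmp(p.total[resourceName]) > 0 {
			return true // even the smallest pod no longer fits
		}
	}
	return false
}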

Benchmarks:

  • Packing 20,000 pods previously took 91 seconds; it now takes about 3 seconds and uses drastically less memory. NOTE: the "post-optimization" benchmark graph was produced before the minPod optimization, which reduced the 20,000-pod case from around 6 seconds to 3 seconds. The benchmarks were executed on 2 m5.large instance types.

[benchmark graphs]

Testing:

  • Performed A/B testing with 1:1 inputs; both implementations produced the same output. The specific test walked cpu, memory, and pods-to-pack up in increments of 100 millicores, 64 MB, and 1 pod respectively, with upper limits of 10,000 millicores, 8 GB, and 90 pods (a rough sketch follows this list).
  • A/B tests on multiple-node packings were also performed ad hoc.
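
For illustration only, the A/B sweep described above could be structured roughly like this (makeTestPods, oldPacker, and newPacker are hypothetical names, not the actual test code; assumes the standard testing and reflect packages):

func TestPackingABSweep(t *testing.T) {
	// Walk cpu, memory, and pod count up in the increments described above
	// and require that the old and new packers produce identical results.
	for cpu := int64(100); cpu <= 10000; cpu += 100 { // millicores
		for mem := int64(64); mem <= 8192; mem += 64 { // MB
			for n := 1; n <= 90; n++ {
				pods := makeTestPods(n, cpu, mem) // hypothetical pod builder
				a := oldPacker.Pack(pods)         // packing behavior before this PR
				b := newPacker.Pack(pods)         // packing behavior after this PR
				if !reflect.DeepEqual(a, b) {
					t.Fatalf("packings diverged at cpu=%dm mem=%dMB pods=%d", cpu, mem, n)
				}
			}
		}
	}
}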

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Review thread on pkg/packing/packer.go (outdated, resolved):
}
return result
}

// isFull checks if the reserved resources + the minimum resourced pod overflows the total resources available.
func (p *Packable) isFull(minPod *v1.Pod) bool {
@ellistarn (Contributor) commented on Aug 31, 2021:

Any reason you can't share logic with:

func (p *Packable) reservePod(pod *v1.Pod) bool {
	requests := resources.RequestsForPods(pod)
	requests[v1.ResourcePods] = *resource.NewQuantity(1, resource.BinarySI)
	return p.reserve(requests)
}
func (p *Packable) reserve(requests v1.ResourceList) bool {
	candidate := resources.Merge(p.reserved, requests)
	// If any candidate resource exceeds total, fail to reserve
	for resourceName, quantity := range candidate {
		if quantity.Cmp(p.total[resourceName]) > 0 {
			return false
		}
	}
	p.reserved = candidate
	return true
}

As far as I understand it, this function does exactly what you need, you'd just need a dry-run mode or something.
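
For context, the suggested dry-run split could look roughly like this (illustrative only; as discussed below, the PR ultimately kept reserve and isFull separate):

// fits is the dry-run check: would the candidate reservation stay within totals?
func (p *Packable) fits(requests v1.ResourceList) bool {
	candidate := resources.Merge(p.reserved, requests)
	for resourceName, quantity := range candidate {
		if quantity.Cmp(p.total[resourceName]) > 0 {
			return false
		}
	}
	return true
}

// reserve would then only commit the reservation when the dry-run check passes.
func (p *Packable) reserve(requests v1.ResourceList) bool {
	if !p.fits(requests) {
		return false
	}
	p.reserved = resources.Merge(p.reserved, requests)
	return true
}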

@bwagner5 (Contributor, Author) replied:

Yeah, you're probably right that this can be reduced. Let me try to merge these. I think the dry-run would become isFull, and reserve would still exist but do less: it would just call isFull and then perform the reservation.

@bwagner5 (Contributor, Author) commented on Aug 31, 2021:

Actually, after digging into merging these two or having one use the other, I think it would make the interface of reserve significantly worse. The reserve function does something slightly different and returns a different result.

The reserve function takes a resource list and reserves it on the node if it doesn't overflow the node's resources. The return value signifies whether the reservation succeeded (did not overflow) or failed (overflowed).

The isFull function is only concerned with the node totals: whether they are completely full without overflowing, or whether they would overflow if the smallest pod remaining to be reserved in the list were added.

I think merging these functions makes the code less readable, adds complexity to reserve, and muddies the return value, since we'd need two bool returns: one indicating whether the reservation succeeded and one noting whether the node is full.

@@ -100,10 +101,29 @@ func (p *Packable) Pack(pods []*v1.Pod) *Result {
return result
}
result.unpacked = append(result.unpacked, pod)

if p.isFull(pods[len(pods)-1]) {
Review comment (Contributor):

Is it possible to move this check to the beginning of the loop? Unless I'm misunderstanding.

@bwagner5 (Contributor, Author) replied:

It can be moved to the top, but I think it makes more sense to keep it after the reserve block.

Here's how I reason about the func in English:

Try to reserve space for the pod; if that fails, check whether the node is full and potentially exit the loop. If the node isn't full (next block) and nothing has been packed (meaning the largest pod could not be packed), exit the loop because we need to find a bigger node. If we've made progress but the current node is just getting full, add the pod to the unpacked list and keep trying smaller pods. (A loose sketch of this flow follows.)
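
A loose sketch of that control flow (method names follow the snippets above; the packed/unpacked fields and exact ordering are assumptions, not the merged code):

for i, pod := range pods {
	if p.reservePod(pod) {
		result.packed = append(result.packed, pod) // assumed field name
		continue
	}
	// Reservation failed: if even the smallest remaining pod (the list is
	// sorted, so it is the last element) cannot fit, the node is full.
	if p.isFull(pods[len(pods)-1]) {
		result.unpacked = append(result.unpacked, pods[i:]...)
		break
	}
	// Nothing packed yet means the largest pod did not fit; a bigger node is needed.
	if len(result.packed) == 0 {
		break
	}
	// The node is just getting full; record this pod as unpacked and keep
	// trying smaller pods.
	result.unpacked = append(result.unpacked, pod)
}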

@JacobGabrielson (Contributor)

Nice!

ellistarn previously approved these changes on Sep 1, 2021

@ellistarn (Contributor)

Amazing!

@bwagner5 merged commit a5db3dc into aws:main on Sep 1, 2021
gfcroft pushed a commit to gfcroft/karpenter-provider-aws that referenced this pull request Nov 25, 2023